NB: This document is my original work. No code nor text has been copied from any other student
pacman::p_load(psych, xray , ggplot2, texreg, DT, excelR, plotly, wrapr, sjmisc, sjlabelled, sjstats, sjPlot, dplyr, forcats, knitr, kableExtra, captioner, tidyverse, magick, stringr, Rmisc, gridExtra, bookdown, ggthemes)In a series of national surveys, CAMH and research technology company Delvinia led research to understand the mental health and substance use impacts of COVID-19, and to track changes as the pandemic unfolded [1]. The following interactive dashboard represents selected findings based on online surveys conducted over the following intervals:
All survey participants were English-speaking Canadians ages 18 and older [1].
Below are some key findings for this particular study, wave 6.
Data source: CAMH Studies Mental Health over Coronavirus Pandemic
knitr::include_graphics("https://i.imgur.com/BmTwhaW.png")Figure 1.1: The Delvinia Website
In this report, I employ the different packages in R and various functions to generate firstly the initial analysis of the data, and second the subsequent steps to prepare data for analysis. I use the the dataset provided by the 6th wave of the CAMH study to generate insights regarding:
In this report, I use RStudio to scrutinize the data and generate descriptive analyses using relevant packages. Some of the packages that I use and their functions are:
dplyr: A powerful and efficient data manipulation package that provides a set of tools for filtering, grouping, and summarizing data. It is particularly useful for working with large datasets and is widely used in data wrangling tasks.
ggplot2: A data visualization package that provides a flexible and powerful way to create a wide range of static and interactive plots. It is widely used for creating high-quality data visualizations and is particularly useful for exploratory data analysis.
pacman: An R package management tool, which provides a convenient way to install and manage R packages, similar to the package manager on Linux and MacOS. It is a wrapper around the basic functionality provided by the “install.packages()” function and provides a more user-friendly interface for installing, updating, and removing packages.
The data and the datamap for CAMH study wave 6th were imported.
camh <- read.csv("http://bus-sawtooth.mcmaster.ca/eH705_BLENDED_W2020/CAMH_wave6_nl.csv")
camh_dm <- read.csv("http://bus-sawtooth.mcmaster.ca/eH705_BLENDED_W2020/camh_Q.csv")kable(camh_dm, caption = "The datamap of the survey") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left", fixed_thead = T) %>%
scroll_box(width = "800px", height = "500px")| Variable.ID | Question.Label |
|---|---|
| respid, | respid |
| status, | status |
| L, | Language |
| hWave, | wave tracker |
| Consent, | Consent |
| S1, | In which province or territory do you currently live? |
| hRegion, | In which province or territory do you currently live? |
| S2, | To which of the following age groups do you belong? (detailed) |
| hAge, | To which of the following age groups do you belong |
| S3, | How do you describe your gender identity? (detailed) |
| hGender, | How do you describe your gender identity? |
| Q4_1, | I have tested positive for COVID-19 |
| Q4_2, | Someone close to me has tested positive for COVID-19 |
| Q4_3, | I have had symptoms of COVID-19 but have not been tested |
| Q4_4, | Someone close to me has had symptoms of COVID-19 but has not been tested |
| Q4_5, | I have been tested for COVID-19 but it was negative |
| Q4_6, | Someone close to me has been tested for COVID-19 but it was negative |
| Q4_7, | I am elderly and/or have a health condition that increases the risk |
| Q4_8, | Someone close to me is elderly and/or has a health condition that increases the risk |
| Q4_9, | I have a job that exposes me to high risk |
| Q4_10, | Someone close to me has a job that exposes them to high risk |
| Q4_99, | No covid-19 risks |
| Q5, | How worried are you about the impact of COVID-19 on your personal financial situation? |
| Q6, | Physical distancing measures affected your employment situation? |
| Q6b, | Number of hours you are working for pay been affected |
| Q7, | How worried are you that you or someone close to you will get ill from COVID-19? |
| Q8x1, | Feeling nervous, anxious or on edge |
| Q8x2, | Not being able to stop or control worrying |
| Q8x3, | Worrying too much about different things |
| Q8x4, | Trouble relaxing |
| Q8x5, | Being so restless that it<92>s hard to sit still |
| Q8x6, | Becoming easily annoyed or irritable |
| Q8x7, | Feeling afraid as if something awful might happen |
| Q15, | During the past 7 days, on how many days did you drink ALCOHOL? |
| Q16, | On how many of the past 7 days did you drink 4 or 5 or more drinks on one occasion? |
| Q17, | In the past 7 days, did you drink more ALCOHOL, about same, or lessthan you did before the COVID-19? |
| Q18, | During the past 7 days, on how many days did you use CANNABIS? |
| Q19, | In the past 7 days, did you use CANNABIS ? |
| Q20x1, | In the past 7 days, how often have you felt depressed? |
| Q20x2, | In the past 7 days, how often have you felt lonely? |
| Q20x3, | In the past 7 days, how often have you felt hopeful about the future? |
| Q23CP_1, | Coping - Meditation or Mindfulness |
| Q23CP_2, | Coping - Physical activity |
| Q23CP_3, | Coping - Connected socially with friends or family online or over the phone |
| Q23CP_4, | Coping - Connected socially with friends or family in person |
| Q23CP_5, | Coping - Made time to relax |
| Q23CP_6, | Coping - Followed a routine |
| Q23CP_7, | Coping - Spent time outdoors |
| Q23CP_8, | Coping - Dedicated time to oneself |
| Q23CP_9, | Coping - Engaged in hobbies |
| Q23CP_10, | Coping - Sought help for mental health |
| Q23CP_11, | Coping - Volunteered or donated |
| Q23CP_12, | Coping - Engaged in spiritual practices |
| Q23, | Including yourself, how many people are currently living in your household? |
| hHouseHold, | Including yourself, how many people are currently living in your household? |
| Q24_1, | How many children in each of the following categories live in your household? |
| hChildren, | Do you have children living in your household? |
| Q25, | What is the highest level of education you have completed? |
| Q26, | What is your current marital status? |
| Q27, | Which of the following best describes your racial or ethnic group? |
| Q28, | Total household income (detailed) |
| hIncome, | Total household income |
| Q29, | Type of residence location |
str(camh)
view(camh)A brief overview shows that there are 1003 observations of 66 variables in this dataframe. I will limit my work to the following 17 variables only:
camh_select <- camh %>% select(Age_group,
Gender,
Q4_1_me_covid_test_pos,
Q4_3_me_covid_symptoms_no_test,
Q4_5_me_covid_test_neg,
Q5_fin_worry,
Q8x1_nervous_frq,
Q8x2_not_stop_worry_frq,
Q8x3_worry_diff_frq,
Q15_alcohol,
Q17_alcohol_frq,
Q20x1_depressed_frq,
Q20x2_lonely_frq,
Q20x3_hopeful_frq,
Q25_Education,
Q26_marital,
Q27_race)
head(camh_select) %>%
knitr::kable(caption = "New dataframe of only 17 variables of interest") %>%
kable_classic() %>%
scroll_box(width = "800px", height = "300px")| Age_group | Gender | Q4_1_me_covid_test_pos | Q4_3_me_covid_symptoms_no_test | Q4_5_me_covid_test_neg | Q5_fin_worry | Q8x1_nervous_frq | Q8x2_not_stop_worry_frq | Q8x3_worry_diff_frq | Q15_alcohol | Q17_alcohol_frq | Q20x1_depressed_frq | Q20x2_lonely_frq | Q20x3_hopeful_frq | Q25_Education | Q26_marital | Q27_race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 0 | 3 | 3 | 4 | 3 | 5 | 6 | 8 |
| 3 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | 4 | 1 | 8 |
| 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 5 | 1 | 2 | 1 | 3 | 1 | 8 |
| 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 96 | 2 | 3 | 2 | 1 | 5 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 4 | 0 | 5 | 1 | 1 | 2 | 2 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 99 | 3 | 2 | 2 | 3 | 3 | 3 | 8 |
I switched coding for Q5 - “How worried are you about the impact of COVID-19 on your personal financial situation?” as per instruction (switch coding 1->4, 2->3, 3->2, 4->1).
camh_new = camh_select %>% mutate(Q5_fin_worry=recode(Q5_fin_worry, '1' = 4, '2' = 3, '3' = 2, '4' = 1))
head(camh_new)%>%
knitr::kable(caption = "New dataframe of switched codes") %>%
kable_classic() %>%
scroll_box(width = "800px", height = "300px")| Age_group | Gender | Q4_1_me_covid_test_pos | Q4_3_me_covid_symptoms_no_test | Q4_5_me_covid_test_neg | Q5_fin_worry | Q8x1_nervous_frq | Q8x2_not_stop_worry_frq | Q8x3_worry_diff_frq | Q15_alcohol | Q17_alcohol_frq | Q20x1_depressed_frq | Q20x2_lonely_frq | Q20x3_hopeful_frq | Q25_Education | Q26_marital | Q27_race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 | 4 | 2 | 2 | 2 | 0 | 3 | 3 | 4 | 3 | 5 | 6 | 8 |
| 3 | 1 | 1 | 1 | 1 | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | 4 | 1 | 8 |
| 1 | 1 | 1 | 1 | 1 | 4 | 2 | 2 | 2 | 2 | 5 | 1 | 2 | 1 | 3 | 1 | 8 |
| 2 | 2 | 1 | 1 | 1 | 3 | 2 | 2 | 2 | 96 | 2 | 3 | 2 | 1 | 5 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 3 | 1 | 1 | 4 | 0 | 5 | 1 | 1 | 2 | 2 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 3 | 2 | 2 | 2 | 99 | 3 | 2 | 2 | 3 | 3 | 3 | 8 |
I also rename the variable in this new dataframe to enhance clarity and comprehensiveness.
colnames(camh_new) <- c("age_group",
"gender",
"covid_positive",
"covid_no_test",
"covid_test_negative",
"financial_worry",
"nervous_frq",
"not_stop_worry_frq",
"worry_diff_frq",
"alcohol_drink",
"alcohol_frq",
"depressed_frq",
"lonely_frq",
"hopeful_frq",
"education",
"marital",
"race")
head(camh_new) %>%
knitr::kable(caption = "New dataframe of selected variables renamed") %>%
kable_classic() %>%
scroll_box(width = "800px", height = "300px")| age_group | gender | covid_positive | covid_no_test | covid_test_negative | financial_worry | nervous_frq | not_stop_worry_frq | worry_diff_frq | alcohol_drink | alcohol_frq | depressed_frq | lonely_frq | hopeful_frq | education | marital | race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 | 4 | 2 | 2 | 2 | 0 | 3 | 3 | 4 | 3 | 5 | 6 | 8 |
| 3 | 1 | 1 | 1 | 1 | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | 4 | 1 | 8 |
| 1 | 1 | 1 | 1 | 1 | 4 | 2 | 2 | 2 | 2 | 5 | 1 | 2 | 1 | 3 | 1 | 8 |
| 2 | 2 | 1 | 1 | 1 | 3 | 2 | 2 | 2 | 96 | 2 | 3 | 2 | 1 | 5 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 3 | 1 | 1 | 4 | 0 | 5 | 1 | 1 | 2 | 2 | 1 | 8 |
| 3 | 2 | 1 | 1 | 1 | 3 | 2 | 2 | 2 | 99 | 3 | 2 | 2 | 3 | 3 | 3 | 8 |
Now the data is prepared, I carry on to examine the data, identify any missing values, or any errors in the collected data.
knitr::kable(describe(camh_new), caption = "Summary statistic of the dataframe") %>%
kable_classic_2()| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age_group | 1 | 1003 | 1.914257 | 0.8302094 | 2 | 1.892902 | 1.4826 | 1 | 3 | 2 | 0.1607858 | -1.5327394 | 0.0262142 |
| gender | 2 | 1003 | 1.517448 | 0.5156674 | 2 | 1.511831 | 0.0000 | 1 | 3 | 2 | 0.1048699 | -1.5731583 | 0.0162824 |
| covid_positive | 3 | 1003 | 1.006979 | 0.0832903 | 1 | 1.000000 | 0.0000 | 1 | 2 | 1 | 11.8268145 | 138.0111426 | 0.0026299 |
| covid_no_test | 4 | 1003 | 1.027916 | 0.1648151 | 1 | 1.000000 | 0.0000 | 1 | 2 | 1 | 5.7229353 | 30.7826824 | 0.0052041 |
| covid_test_negative | 5 | 1003 | 1.175474 | 0.3805614 | 1 | 1.094645 | 0.0000 | 1 | 2 | 1 | 1.7038136 | 0.9038850 | 0.0120164 |
| financial_worry | 6 | 1003 | 2.713858 | 0.9040149 | 3 | 2.767123 | 1.4826 | 1 | 4 | 3 | -0.2571560 | -0.7102377 | 0.0285447 |
| nervous_frq | 7 | 1003 | 2.000997 | 0.9864344 | 2 | 1.876712 | 1.4826 | 1 | 4 | 3 | 0.7770422 | -0.3969858 | 0.0311471 |
| not_stop_worry_frq | 8 | 1003 | 1.818544 | 0.9607959 | 2 | 1.657534 | 1.4826 | 1 | 4 | 3 | 0.9954127 | -0.0365771 | 0.0303376 |
| worry_diff_frq | 9 | 1003 | 1.988036 | 0.9643632 | 2 | 1.860523 | 1.4826 | 1 | 4 | 3 | 0.7442076 | -0.3956563 | 0.0304502 |
| alcohol_drink | 10 | 1003 | 10.163509 | 27.0889564 | 1 | 2.099626 | 1.4826 | 0 | 99 | 99 | 2.9444639 | 6.7534390 | 0.8553460 |
| alcohol_frq | 11 | 1003 | 4.184447 | 10.5602008 | 3 | 2.938979 | 0.0000 | 1 | 96 | 95 | 8.5153101 | 71.0667061 | 0.3334431 |
| depressed_frq | 12 | 1003 | 1.813559 | 0.9520100 | 2 | 1.667497 | 1.4826 | 1 | 4 | 3 | 0.9330074 | -0.1906952 | 0.0300601 |
| lonely_frq | 13 | 1003 | 1.826520 | 0.9908839 | 2 | 1.667497 | 1.4826 | 1 | 4 | 3 | 0.9235850 | -0.3350125 | 0.0312876 |
| hopeful_frq | 14 | 1003 | 2.429711 | 1.0264927 | 2 | 2.412204 | 1.4826 | 1 | 4 | 3 | 0.0427304 | -1.1416828 | 0.0324120 |
| education | 15 | 1003 | 5.257228 | 10.0462819 | 5 | 4.353674 | 0.0000 | 1 | 96 | 95 | 8.8104130 | 76.6374240 | 0.3172159 |
| marital | 16 | 1003 | 4.076770 | 11.8837134 | 1 | 2.427148 | 0.0000 | 1 | 96 | 95 | 7.3743845 | 54.1124869 | 0.3752336 |
| race | 17 | 1003 | 8.980060 | 14.7451268 | 8 | 7.069738 | 0.0000 | 1 | 97 | 96 | 5.5261981 | 29.8387681 | 0.4655840 |
camh_new_ano <- anomalies(camh_new)
knitr::kable(camh_new_ano$variables, caption = "The structure of the missing values") %>%
kable_classic_2()| Variable | q | qNA | pNA | qZero | pZero | qBlank | pBlank | qInf | pInf | qDistinct | type | anomalous_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| alcohol_drink | 1003 | 0 |
|
347 | 34.6% | 0 |
|
0 |
|
10 | Integer | 34.6% |
| covid_positive | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
2 | Integer |
|
| covid_no_test | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
2 | Integer |
|
| covid_test_negative | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
2 | Integer |
|
| age_group | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
3 | Integer |
|
| gender | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
3 | Integer |
|
| financial_worry | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Numeric |
|
| nervous_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| not_stop_worry_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| worry_diff_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| depressed_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| lonely_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| hopeful_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
4 | Integer |
|
| alcohol_frq | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
6 | Integer |
|
| education | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
6 | Integer |
|
| marital | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
7 | Integer |
|
| race | 1003 | 0 |
|
0 |
|
0 |
|
0 |
|
12 | Integer |
|
#create an empty list
plots <- list()
#create loops
for (i in colnames(camh_new)){
value <- camh_new[[i]]
variable_plot <- plot_frq(value) + labs(x = i)
plots[[i]] <- variable_plot
}
#arrange the plots together
multiplot(plotlist = plots, cols = 5, layout = NULL)Figure 4.1: Plot frequencies of the variables in the dataset
Observations on scrutinizing the data:
I change value “96” from Q15 into n/a and remove those n/a entries from the dataframe. This is only to demonstrate how to remove n/a from a dataset. Otherwise can use the subset() function to remove entries associated with the “96” value.
The new data frame from now will be referred to as “camh_clean”. Now there is only 999 observations
camh_clean <- camh_new
camh_clean$alcohol_drink[camh_clean$alcohol_drink == 96] <- NA
camh_clean <- na.omit(camh_clean)
kable(frq(camh_clean$alcohol_drink, out = "t"), caption = "Removing n/a values") %>%
kable_classic() %>% kable_styling(full_width = FALSE)
|
For some questions, there are the “Preferred not to answer” option, which coded by “96”; as well as “Not sure” coded by “97”, and they occupy less than 5% of the total observations. Therefore, I will remove observations that have a value of “96” and “97” to make the data more relevant.
Now there is only 954 observations
camh_clean2 <- subset(camh_clean, !rowSums(camh_clean == 96))
camh_clean2 <- filter(camh_clean2, !rowSums(camh_clean2 == 97))
#an example:
kable(frq(camh_clean2$race, out = "t"), caption = "Removing unnecessary data") %>%
kable_classic() %>% kable_styling(full_width = FALSE)
|
Data in the “camh_new” dataframe will be recoded, following the instructions in the new datamap, “20-3082-DATAMAP.xlsx”.
Demographic variables will be decoded directly (without creating the factor forms, as we do not need their numeric values for calculation) such as:
camh_label = camh_clean2 %>%
mutate(age_group=recode(age_group,
'1' = "18-39",
'2' = "40-59",
'3' = "60+"),
gender=recode(gender,
'1' = "Male",
'2' = "Female",
'3' = "Other"),
covid_positive=recode(covid_positive,
'1' = "No",
'2' = "Yes"),
covid_no_test=recode(covid_no_test,
'1' = "No",
'2' = "Yes"),
covid_test_negative=recode(covid_test_negative,
'1' = "No",
'2' = "Yes"),
education=recode(education,
'1' = "Did not graduate from high school",
'2' = "Completed high school",
'3' = "Post-high school education",
'4' = "College diploma/degree",
'5' = "University diploma/degree"),
marital=recode(marital,
'1' = "Married",
'2' = "Living with a partner",
'3' = "Widowed",
'4' = "Divorced",
'5' = "Separated",
'6' = "Never married"),
race=recode(race,
'1' = "Asian - East",
'2' = "Asian - South",
'3' = "Asian - South East",
'4' = "Black",
'5' = "Indigenous",
'6' = "Latin American",
'7' = "Middle Eastern",
'8' = "White",
'9' = "Mixed heritage",
'10' = "Other"))
knitr::kable(head(camh_label), caption = "Recode Demographic variables") %>%
kable_classic() %>%
scroll_box(width = "800px", height = "300px")| age_group | gender | covid_positive | covid_no_test | covid_test_negative | financial_worry | nervous_frq | not_stop_worry_frq | worry_diff_frq | alcohol_drink | alcohol_frq | depressed_frq | lonely_frq | hopeful_frq | education | marital | race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18-39 | Female | No | No | No | 4 | 2 | 2 | 2 | 0 | 3 | 3 | 4 | 3 | University diploma/degree | Never married | White |
| 60+ | Male | No | No | No | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | College diploma/degree | Married | White |
| 18-39 | Male | No | No | No | 4 | 2 | 2 | 2 | 2 | 5 | 1 | 2 | 1 | Post-high school education | Married | White |
| 60+ | Female | No | No | No | 3 | 1 | 1 | 4 | 0 | 5 | 1 | 1 | 2 | Completed high school | Married | White |
| 60+ | Female | No | No | No | 3 | 2 | 2 | 2 | 99 | 3 | 2 | 2 | 3 | Post-high school education | Widowed | White |
| 18-39 | Female | No | No | Yes | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 1 | 2 | University diploma/degree | Married | White |
Non-demographic variables will be decoded by creating a factor forms, namely:
camh_label$financial_worry_factored <- factor(camh_label$financial_worry,
levels = c(1,2,3,4),
labels = c("Not at all worried", "Not very worried", "Some what worried", "Very worried"))
camh_label$nervous_frq_factored <- factor(camh_label$nervous_frq,
levels = c(1,2,3,4),
labels = c("Not at all", "Several days", "Over half the days", "Nearly everyday"))
camh_label$not_stop_worry_factored <- factor(camh_label$not_stop_worry_frq,
levels = c(1,2,3,4),
labels = c("Not at all", "Several days", "Over half the days", "Nearly everyday"))
camh_label$worry_diff_factored <- factor(camh_label$worry_diff_frq,
levels = c(1,2,3,4),
labels = c("Not at all", "Several days", "Over half the days", "Nearly everyday"))
camh_label$alcohol_frq_factored <- factor(camh_label$alcohol_frq,
levels = c(1,2,3,4,5),
labels = c("Drink much more", "Drink slightly more", "No change", "Drink slightly less", "Drink much less"))
camh_label$depressed_frq_factored <- factor(camh_label$depressed_frq,
levels = c(1,2,3,4),
labels = c("Rarely or none", "Some or little", "Occasionally", "Most/All the time"))
camh_label$lonely_frq_factored <- factor(camh_label$lonely_frq,
levels = c(1,2,3,4),
labels = c("Rarely or none", "Some or little", "Occasionally", "Most/All the time"))
camh_label$hopeful_frq_factored <- factor(camh_label$hopeful_frq,
levels = c(1,2,3,4),
labels = c("Rarely or none", "Some or little", "Occasionally", "Most/All the time"))
knitr::kable(head(camh_label), caption = "Recode Non-Demographic variables") %>%
kable_classic() %>%
scroll_box(width = "800px", height = "300px")| age_group | gender | covid_positive | covid_no_test | covid_test_negative | financial_worry | nervous_frq | not_stop_worry_frq | worry_diff_frq | alcohol_drink | alcohol_frq | depressed_frq | lonely_frq | hopeful_frq | education | marital | race | financial_worry_factored | nervous_frq_factored | not_stop_worry_factored | worry_diff_factored | alcohol_frq_factored | depressed_frq_factored | lonely_frq_factored | hopeful_frq_factored |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18-39 | Female | No | No | No | 4 | 2 | 2 | 2 | 0 | 3 | 3 | 4 | 3 | University diploma/degree | Never married | White | Very worried | Several days | Several days | Several days | No change | Occasionally | Most/All the time | Occasionally |
| 60+ | Male | No | No | No | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 3 | 2 | College diploma/degree | Married | White | Very worried | Several days | Several days | Over half the days | No change | Some or little | Occasionally | Some or little |
| 18-39 | Male | No | No | No | 4 | 2 | 2 | 2 | 2 | 5 | 1 | 2 | 1 | Post-high school education | Married | White | Very worried | Several days | Several days | Several days | Drink much less | Rarely or none | Some or little | Rarely or none |
| 60+ | Female | No | No | No | 3 | 1 | 1 | 4 | 0 | 5 | 1 | 1 | 2 | Completed high school | Married | White | Some what worried | Not at all | Not at all | Nearly everyday | Drink much less | Rarely or none | Rarely or none | Some or little |
| 60+ | Female | No | No | No | 3 | 2 | 2 | 2 | 99 | 3 | 2 | 2 | 3 | Post-high school education | Widowed | White | Some what worried | Several days | Several days | Several days | No change | Some or little | Some or little | Occasionally |
| 18-39 | Female | No | No | Yes | 4 | 2 | 2 | 3 | 2 | 3 | 2 | 1 | 2 | University diploma/degree | Married | White | Very worried | Several days | Several days | Over half the days | No change | Some or little | Rarely or none | Some or little |
Values in question 15 will be re-coded into 2 different categories:
camh_label$alcohol_drink_cat <- if_else(camh_label$alcohol_drink %in% c(0,1,2,3,4,5,6,7), "Drinking",
ifelse(camh_label$alcohol_drink %in% c(99), "Not drinking", NA))
sum(is.na(camh_label))[1] 0
camh_label_new <- camh_label[,-(6:14)]
knitr::kable(head(camh_label_new), caption = "Clean data set ready for analysis") %>%
kable_classic() | age_group | gender | covid_positive | covid_no_test | covid_test_negative | education | marital | race | financial_worry_factored | nervous_frq_factored | not_stop_worry_factored | worry_diff_factored | alcohol_frq_factored | depressed_frq_factored | lonely_frq_factored | hopeful_frq_factored | alcohol_drink_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18-39 | Female | No | No | No | University diploma/degree | Never married | White | Very worried | Several days | Several days | Several days | No change | Occasionally | Most/All the time | Occasionally | Drinking |
| 60+ | Male | No | No | No | College diploma/degree | Married | White | Very worried | Several days | Several days | Over half the days | No change | Some or little | Occasionally | Some or little | Drinking |
| 18-39 | Male | No | No | No | Post-high school education | Married | White | Very worried | Several days | Several days | Several days | Drink much less | Rarely or none | Some or little | Rarely or none | Drinking |
| 60+ | Female | No | No | No | Completed high school | Married | White | Some what worried | Not at all | Not at all | Nearly everyday | Drink much less | Rarely or none | Rarely or none | Some or little | Drinking |
| 60+ | Female | No | No | No | Post-high school education | Widowed | White | Some what worried | Several days | Several days | Several days | No change | Some or little | Some or little | Occasionally | Not drinking |
| 18-39 | Female | No | No | Yes | University diploma/degree | Married | White | Very worried | Several days | Several days | Over half the days | No change | Some or little | Rarely or none | Some or little | Drinking |
df_1 <- camh_label_new[1:3]
demo_covid_yes <- table(subset(df_1, covid_positive == "Yes")[, c("age_group", "gender")])
demo_covid_no <- table(subset(df_1, covid_positive == "No")[, c("age_group", "gender")])
kable(demo_covid_yes, caption = "Number of respondents infected with COVID-19") %>%
kable_classic() %>%
kable_styling(full_width = T, position = "left") %>%
add_header_above(bold = TRUE, align = "c", c(" " = 2, "Infected with COVID" = 1))|
Infected with COVID
|
||
|---|---|---|
| Female | Male | |
| 18-39 | 2 | 2 |
| 40-59 | 0 | 1 |
| 60+ | 0 | 1 |
kable(demo_covid_no, caption = "Number of respondents not infected with COVID-19") %>%
kable_classic() %>%
kable_styling(full_width = T, position = "left") %>%
add_header_above(bold = TRUE, align = "c", c(" " = 2, "No COVID" = 2))|
No COVID
|
|||
|---|---|---|---|
| Female | Male | Other | |
| 18-39 | 192 | 169 | 4 |
| 40-59 | 145 | 142 | 1 |
| 60+ | 146 | 148 | 1 |
graph_1 <-
ggplot(df_1, aes(x = gender, fill = gender)) +
geom_bar() +
geom_text(aes(label = after_stat(count)), inherit.aes = TRUE, stat='count', vjust=-1) +
facet_wrap(~ covid_positive, scales = "free_x") +
labs(x = "Gender", y = "Number of respondents", fill = "Gender") +
labs(title = "COVID-19 positive rate of survey respondents by genders") +
theme_hc() + scale_fill_hc()
graph_2 <-
ggplot(df_1, aes(x = age_group, fill = age_group)) +
geom_bar() +
geom_text(aes(label = after_stat(count)), inherit.aes = TRUE, stat='count', vjust=-1) +
facet_wrap(~ covid_positive, scales = "free_x") +
labs(x = "Age groups", y = "Number of respondents", fill = "Age groups") +
labs(title = "COVID-19 positive rate of survey respondents by age groups") +
theme_hc() + scale_fill_hc()
multiplot(graph_1, graph_2, layout=matrix(c(1,2), nrow=1, byrow=TRUE))Figure 4.2: Graph of COVID-19 status of survey respondents
From Table 4.8, Table 4.9, and Figure 4.2 above, we can see that the majority of people took part in this survey were not COVID-19, be it asymptomatic or have been tested. There are people with confirmed COVID-19 infection, but their number is too small to be accounted for, at 6 people over the total of 954 responses.
In terms of genders, there is an equal number of males and females taking part in this survey, with the latter group (483) is a slightly more than the former one (459). The third group of “Other” gender made up a humble statistic of 6 people.
In terms of age groups, the youngest group of respondents who are from 18 to 39 years old was significantly larger than the other two groups, with a gap as wide as 100 respondents, making up approximately 20% - 25% of the total people in each group. This may be owing to the fact that young adults are more technical savvy than their older counterparts, thus they have a greater tendency to engage in online surveys and studies similar to the CAMH research. However, it still requires further analysis to determine whether or not this difference carries any significance.
df_2 <- camh_label_new[,c("gender","alcohol_drink_cat")]
df_2aa <- df_2[df_2$alcohol_drink_cat == "Drinking",]
df_2bb <- df_2[df_2$alcohol_drink_cat == "Not drinking",]
df_2a <- df_2aa %>%
group_by(gender) %>%
tally(name = "people") %>%
mutate(pct_a=round(people/sum(people),3)) %>%
arrange(desc(pct_a))
df_2b <- df_2bb %>%
group_by(gender) %>%
tally(name = "people") %>%
mutate(pct_b=round(people/sum(people),3)) %>%
arrange(desc(pct_b))
tab_2a <- as.data.frame(df_2a)
tab_2b <- as.data.frame(df_2b)kable(tab_2a, caption = "Respondents who drink") %>%
kable_classic() %>%
kable_styling(full_width = T, position = "left") %>%
add_header_above(bold = TRUE, align = "c", c(" " = 1, "Drinkers" = 2))|
Drinkers
|
||
|---|---|---|
| gender | people | pct_a |
| Female | 448 | 0.508 |
| Male | 429 | 0.486 |
| Other | 5 | 0.006 |
kable(tab_2b, caption = "Respondents who do not drink") %>%
kable_classic() %>%
kable_styling(full_width = T, position = "left") %>%
add_header_above(bold = TRUE, align = "c", c(" " = 1, "Non drinkers" = 2))|
Non drinkers
|
||
|---|---|---|
| gender | people | pct_b |
| Female | 37 | 0.514 |
| Male | 34 | 0.472 |
| Other | 1 | 0.014 |
df_3 <- camh_label[,c("gender","alcohol_drink","alcohol_frq_factored","nervous_frq_factored","not_stop_worry_factored","worry_diff_factored")]
df_3 <- subset(df_3, !rowSums(df_3 == 99))
df_30 <- df_3[df_3$alcohol_drink == 0,]
df_37 <- df_3[df_3$alcohol_drink == 7,]graph3 <- df_2a %>%
ggplot(aes(x="", y=pct_a, fill=gender)) +
geom_bar(stat="identity", width = 1, color="white") +
coord_polar("y", start=0) +
labs(title = "Respondents who drink alcohol") +
geom_text(aes(label=paste0(pct_a*100,"%")), position = position_stack(vjust = 0.5), size=7) +
theme_hc() + scale_fill_hc()
graph4 <- df_2b %>%
ggplot(aes(x="", y=pct_b, fill=gender)) +
geom_bar(stat="identity", width = 1, color="white") +
coord_polar("y", start=0) +
labs(title = "Respondents who do not drink alcohol") +
geom_text(aes(label=paste0(pct_b*100,"%")), position = position_stack(vjust = 0.5), size=7) +
theme_hc() + scale_fill_hc()
multiplot(graph3, graph4, layout=matrix(c(1,2), nrow=1, byrow=TRUE))Figure 4.3: Graph of drinking status of survey respondents
graph5 <-
ggplot(df_3, aes(x=alcohol_drink)) +
geom_histogram(binwidth=1, fill="#009E73") +
geom_vline(aes(xintercept=mean(alcohol_drink)), color="red", linetype="dashed") +
geom_text(aes(label = after_stat(count)), inherit.aes = TRUE, stat='count', vjust=-1) +
xlab("Days") +
ylab("Frequency") +
ggtitle("Days of drinking in the past 7 days") +
theme_hc() + scale_fill_hc()
graph6 <-
ggplot(df_3, aes(x = alcohol_frq_factored, fill = gender)) +
geom_bar() +
geom_text(aes(label = after_stat(count)), inherit.aes = TRUE, stat='count', vjust=-1) +
facet_wrap(~ gender, scales = "free_x", shrink = TRUE) +
labs(x = "Gender", y = "Number of respondents", fill = "Gender") +
labs(title = "Frequency of alcohol consumption changes by genders") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme_hc() + scale_fill_hc()
show(graph5)Figure 4.4: Graph of drinking status of survey respondents
show(graph6)Figure 4.5: Graph of drinking status of survey respondents
kable(frq(df_3$alcohol_drink, out = "t"), caption = "The structure of the number of days people drink") %>%
kable_classic() %>%
kable_styling(full_width = F, position = "left") %>%
add_header_above(bold = TRUE, align = "c")
|
show(graph7)Figure 4.6: Stress frequency among respondents
show(graph8)Figure 4.7: Stress frequency among respondents
show(graph9)Figure 4.8: Stress frequency among respondents
multiplot(graph7a, graph7b, graph7c, layout=matrix(c(1,2,3), nrow=1, byrow=TRUE))Figure 4.9: Stress levels between people who did not drink and who binged drinking
multiplot(graph8a, graph8b, graph8c, layout=matrix(c(1,2,3), nrow=1, byrow=TRUE))Figure 4.10: Stress levels between people who did not drink and who binged drinking
In conclusion, I would like to summarize my process for this analytic report:
I studied the data, the datamap, and filtered the 17 variables of interest
I evaluated n/a and anomalous values, weighing their impact against the value of the data set and subsequent analysis. For example, I maintained the value of 99 as “Not drinking” and used it for analysis. On the other hand, I removed values corresponding to option “Preferred not to answer” as they would not contribute to insights
I recoded the values using different methods - the factor form and the direct recoding - with respect to their nature, demographic or non-demographic values
I used different approaches - tables and graphs - to generate insights from the data set
Summary of findings:
[1] COVID-19 national survey dashboard. (n.d.). CAMH. https://www.camh.ca/en/health-info/mental-health-and-covid-19/covid-19-national-survey